    Towards an Achievable Performance for the Loop Nests

    Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including by exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem: the sequence is guided by heuristics and/or analytical models, and there is no way of knowing how close it gets to optimal performance or whether any headroom for improvement remains. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterize loop nests based on their hardware performance counter values, together with a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made both for auto-vectorized serial compilation and for auto-parallelization. Based on the Machine Learning predictions, the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for serial code and from 1.30x to 1.71x for auto-parallelized code.
    Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018).
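
    As a hedged illustration of the prediction step described above, the sketch below trains a classifier to choose the fastest compiler for a loop nest from its hardware performance counter values. The counter features, the three-compiler label set, and the random-forest model are illustrative assumptions, not the authors' exact setup.

        # Minimal sketch: predict the fastest compiler for a loop nest from
        # hardware performance counters. Features, labels, and the model
        # choice are assumptions for illustration, not the paper's setup.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)

        # Stand-in data: one row of counter values per loop nest
        # (e.g. instructions retired, cache misses, branch mispredictions).
        X = rng.random((200, 4))
        # Label = index of the (hypothetical) compiler that produced the
        # fastest code for that loop nest.
        y = rng.integers(0, 3, size=200)

        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        print(cross_val_score(clf, X, y, cv=5).mean())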

    Exploiting Anti-monotonicity of Multi-label Evaluation Measures for Inducing Multi-label Rules

    Exploiting dependencies between labels is considered crucial for multi-label classification. Rules are able to expose label dependencies such as implications, subsumptions or exclusions in a human-comprehensible and interpretable manner. However, the induction of rules with multiple labels in the head is particularly challenging, as the number of label combinations that must be taken into account for each rule grows exponentially with the number of available labels. To overcome this limitation, algorithms for exhaustive rule mining typically exploit properties such as anti-monotonicity or decomposability to prune the search space. In the present paper, we examine whether commonly used multi-label evaluation metrics satisfy these properties and are therefore suited to pruning the search space for multi-label heads.
    Comment: Preprint version. To appear in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2018. See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3074 for further information. arXiv admin note: text overlap with arXiv:1812.0005
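
    To make the pruning idea concrete, here is a minimal sketch of anti-monotonicity-based pruning when enumerating multi-label heads. The evaluate function and the threshold are placeholders; the key point is that, for an anti-monotonic measure, every superset of a head that falls below the threshold can be discarded without evaluation.

        # Sketch of anti-monotonicity pruning over candidate rule heads.
        from itertools import combinations

        def search_heads(labels, evaluate, threshold):
            accepted, pruned = [], set()
            for size in range(1, len(labels) + 1):
                for head in combinations(labels, size):
                    # Skip heads extending an already-failed head: for an
                    # anti-monotonic measure, their score cannot recover.
                    if any(set(p) <= set(head) for p in pruned):
                        continue
                    if evaluate(head) >= threshold:
                        accepted.append(head)
                    else:
                        pruned.add(head)
            return accepted

    The pruning is sound only if the measure is in fact anti-monotonic, which is precisely the property the paper examines for common multi-label metrics.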

    Finding a short and accurate decision rule in disjunctive normal form by exhaustive search

    Greedy approaches suffer from a restricted search space, which can lead to suboptimal classifiers in terms of performance and classifier size. This study discusses exhaustive search as an alternative to greedy search for learning short and accurate decision rules. The Exhaustive Procedure for LOgic-Rule Extraction (EXPLORE) algorithm is presented, which induces decision rules in disjunctive normal form (DNF) in a systematic and efficient manner. We propose a method based on subsumption to reduce the number of values considered for instantiation in the literals, taking into account the relational operator, without loss of performance. Furthermore, we describe a branch-and-bound approach that makes optimal use of user-defined performance constraints. To improve generalizability, we use a validation set to determine the optimal length of the DNF rule. The performance and size of the DNF rules induced by EXPLORE are compared to those of eight well-known rule learners. Our results show that an exhaustive approach to rule learning in DNF yields significantly smaller classifiers than the other rule learners, while securing comparable or even better performance. Clearly, exhaustive search is computationally intensive and may not always be feasible. Nevertheless, based on this study, we believe that exhaustive search should be considered as an alternative to greedy search in many problems.
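
    As a hedged sketch of the exhaustive, length-first search the abstract describes (not EXPLORE's actual implementation), the following enumerates DNF rules of increasing size and stops at the shortest rule meeting a user-defined accuracy constraint; the literals and scoring are simplified placeholders.

        # Sketch of exhaustive DNF rule search with a length-first cutoff.
        from itertools import combinations

        def covers(conjunction, example):
            # A conjunction is a tuple of (feature, value) literals.
            return all(example.get(f) == v for f, v in conjunction)

        def accuracy(dnf, data, labels):
            preds = [any(covers(c, x) for c in dnf) for x in data]
            return sum(p == y for p, y in zip(preds, labels)) / len(labels)

        def explore_sketch(literals, data, labels, max_terms=2,
                           term_len=2, min_acc=0.8):
            best, best_acc = None, min_acc  # performance constraint as bound
            terms = list(combinations(literals, term_len))
            for n in range(1, max_terms + 1):       # shortest rules first
                for dnf in combinations(terms, n):
                    acc = accuracy(dnf, data, labels)
                    if acc > best_acc:
                        best, best_acc = dnf, acc
                if best is not None:
                    break  # a longer rule cannot be both shorter and better
            return best, best_acc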

    Iso-osmotic regulation of nitrate accumulation in lettuce (Lactuca sativa L.)

    Concerns about possible health hazards arising from human consumption of lettuce and other edible vegetable crops with high concentrations of nitrate have generated demands for a greater understanding of the processes involved in its uptake and accumulation, in order to devise more sustainable strategies for its control. This paper evaluates a proposed iso-osmotic mechanism for the regulation of nitrate accumulation in lettuce (Lactuca sativa L.) heads. This mechanism assumes that changes in the concentrations of nitrate and all other endogenous osmotica (including anions, cations and neutral solutes) are continually adjusted in tandem to minimise differences in the osmotic potential of the shoot sap during growth, with these changes occurring independently of any variations in external water potential. The hypothesis was tested using data from six new experiments, each with a single unique treatment comprising a separate combination of light intensity, N source (nitrate with or without ammonium) and nitrate concentration, carried out hydroponically in a glasshouse using a butterhead lettuce variety. Repeated measurements of plant weights and estimates of all of the main soluble constituents (nitrate, potassium, calcium, magnesium, organic anions, chloride, phosphate, sulphate and soluble carbohydrates) in the shoot sap were made at intervals from about 2 weeks after transplanting until commercial maturity, and the data were used to calculate changes in average osmotic potential in the shoot. Results showed that nitrate concentrations in the sap increased when average light levels were reduced by between 30 and 49% and (to a lesser extent) when nitrate was supplied at a supra-optimal concentration, and declined with partial replacement of nitrate by ammonium in the external nutrient supply. The associated changes in the proportions of other endogenous osmotica, in combination with the adjustment of shoot water content, kept the total solute concentration in the shoot sap approximately constant and minimised differences in osmotic potential between treatments at each sampling date. There was, however, a gradual increase in osmotic potential (i.e. a decline in total solute concentration) over time, largely caused by increases in shoot water content associated with the physiological and morphological development of the plants. Regression analysis using normalised data (to correct for these time trends) showed that the results were consistent with a 1:1 exchange between the concentration of nitrate and the sum of all other endogenous osmotica throughout growth, providing evidence that an iso-osmotic mechanism (incorporating both concentration and volume regulation) was involved in controlling nitrate concentrations in the shoot.
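
    The 1:1 exchange reported in the regression analysis amounts to a slope near -1 when normalised nitrate concentration is regressed against the normalised sum of all other osmotica. The sketch below illustrates that check with synthetic numbers; the paper's data and normalisation procedure are not reproduced here.

        # Illustrative check of the iso-osmotic 1:1 exchange hypothesis
        # using synthetic sap concentrations (mM); a slope close to -1
        # indicates one-for-one exchange between nitrate and other osmotica.
        import numpy as np

        rng = np.random.default_rng(1)
        other_osmotica = rng.uniform(150, 250, size=30)
        nitrate = 300 - other_osmotica + rng.normal(0, 5, 30)

        slope, intercept = np.polyfit(other_osmotica, nitrate, 1)
        print(f"slope = {slope:.2f} (1:1 exchange predicts -1)")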

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single/multiple imputation methods to recover the missing values; we then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms the others, particularly as the amount of missing data increases.
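
    A minimal sketch of the bagging-style variant described above: impute the incomplete training data several times, fit one classifier per imputed copy, and majority-vote the predictions. The choice of scikit-learn's IterativeImputer and a decision tree here is illustrative, not the paper's exact configuration.

        # Sketch of a bagging-style Multiple Imputation Ensemble (MIE).
        import numpy as np
        from scipy import stats
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer
        from sklearn.tree import DecisionTreeClassifier

        def mie_fit_predict(X_train, y_train, X_test, m=5):
            votes = []
            for seed in range(m):
                # Each pass draws a different plausible imputation.
                imp = IterativeImputer(random_state=seed, sample_posterior=True)
                Xi = imp.fit_transform(X_train)
                clf = DecisionTreeClassifier(random_state=seed).fit(Xi, y_train)
                votes.append(clf.predict(imp.transform(X_test)))
            # Majority vote across the m imputation-specific classifiers.
            return stats.mode(np.vstack(votes), axis=0, keepdims=False).mode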

    A vision for incorporating human mobility in the study of human-wildlife interactions

    As human activities increasingly shape land- and seascapes, understanding human-wildlife interactions is imperative for preserving biodiversity. Habitats are impacted not only by static modifications, such as roads, buildings and other infrastructure, but also by the dynamic movement of people and their vehicles over shorter time scales. While there is increasing realization that both components of human activity significantly affect wildlife, capturing these more dynamic processes in ecological studies has proved challenging. Here, we propose a novel conceptual framework for developing a ‘Dynamic Human Footprint’ that explicitly incorporates human mobility, providing a key link between anthropogenic stressors and ecological impacts across spatiotemporal scales. Specifically, the Dynamic Human Footprint integrates a range of metrics to fully acknowledge the time-varying nature of human activities and to enable scale-appropriate assessments of their impacts on wildlife behavior, demography, and distributions. We review existing terrestrial and marine human mobility data products and provide a roadmap for how these could be integrated and extended to enable more comprehensive analyses of human impacts on biodiversity in the Anthropocene.

    On Evaluating MHC-II Binding Peptide Prediction Methods

    The choice of one method over another for MHC-II binding peptide prediction is typically based on published reports of their estimated performance on standard benchmark datasets. We show that several standard benchmark datasets of unique peptides used in such studies contain a substantial number of peptides that share a high degree of sequence identity with one or more other peptide sequences in the same dataset. Thus, in a standard cross-validation setup, the test set and the training set are likely to contain highly similar sequences, leading to overly optimistic estimates of performance. Hence, to more rigorously assess the relative performance of different prediction methods, we explore the use of similarity-reduced datasets. We introduce three similarity-reduced MHC-II benchmark datasets derived from the MHCPEP, MHCBN, and IEDB databases. Comparing the performance of three MHC-II binding peptide prediction methods estimated on datasets of unique peptides with that obtained on their similarity-reduced counterparts shows that the former estimates can be rather optimistic. Furthermore, our results demonstrate that conclusions regarding the superiority of one method over another, drawn on the basis of performance estimates obtained using commonly used datasets of unique peptides, are often contradicted by the observed performance of the methods on the similarity-reduced versions of the same datasets. These results underscore the importance of using similarity-reduced datasets when rigorously comparing the performance of alternative MHC-II peptide prediction methods.
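
    The similarity reduction the abstract relies on can be sketched as a greedy filter: keep a peptide only if its sequence identity to every already-kept peptide is below a threshold. The position-wise identity measure and the 0.8 cutoff below are illustrative assumptions, not the exact procedure used for the benchmarks.

        # Greedy similarity reduction for a peptide dataset (sketch).
        def identity(a, b):
            # Fraction of matching positions over the shorter sequence;
            # a crude stand-in for alignment-based sequence identity.
            n = min(len(a), len(b))
            return sum(x == y for x, y in zip(a, b)) / n

        def similarity_reduce(peptides, threshold=0.8):
            kept = []
            for p in peptides:
                if all(identity(p, q) < threshold for q in kept):
                    kept.append(p)
            return kept

        print(similarity_reduce(["AAAAKLLP", "AAAAKLLQ", "GVSTNMDE"]))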